Sq8 to Sq8 dist functions - ip and cosine [MOD-13170] #873

dor-forer · 2025-12-28T09:39:15Z

Describe the changes in the pull request

Added support for SQ8-to-SQ8 IP and Cosine spaces.
Intel:
AVX512F_BW_VL_VNNI - required to support the uint8 and int8 operations.
ARM:
SVE
NEON

The cosine functions assume that the vectors are normalized, so they do not divide by the norms.
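The sketch below shows why normalization lets the cosine path skip the division. It is plain C++ and is not the library's code. For unit-length vectors, cosine similarity equals the dot product, so cosine distance reduces to inner-product distance. The sketch assumes the `1 - <a, b>` convention for IP distance, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: inner-product distance, 1 - <a, b>.
float ip_distance(const std::vector<float> &a, const std::vector<float> &b) {
    assert(a.size() == b.size());
    float dot = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) dot += a[i] * b[i];
    return 1.0f - dot;
}

// Hypothetical helper: full cosine distance, which divides by both L2 norms.
float cosine_distance_full(const std::vector<float> &a, const std::vector<float> &b) {
    assert(a.size() == b.size());
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return 1.0f - dot / (std::sqrt(na) * std::sqrt(nb));
}

// Scale a non-zero vector to unit L2 norm in place.
void normalize(std::vector<float> &v) {
    float n = 0.0f;
    for (float x : v) n += x * x;
    n = std::sqrt(n);
    assert(n > 0.0f);
    for (float &x : v) x /= n;
}
```

After both vectors are normalized, `ip_distance` and `cosine_distance_full` agree up to floating-point rounding. That agreement is the property the cosine functions rely on when they skip the norm division.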
Which issues this PR fixes

MOD-13170

Main objects this PR modified

...
...

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

- Implemented inner product and cosine distance functions for SQ8-to-SQ8 vectors on the SVE, NEON, and AVX512 architectures.
- Added the corresponding distance function selection logic in IP_space.cpp and function declarations in IP_space.h.
- Created benchmarks for the SQ8-to-SQ8 distance functions to evaluate performance across architectures.
- Developed unit tests that validate the new distance functions against expected results.
- Ensured compatibility with existing optimization features for various CPU architectures.
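The selection logic in IP_space.cpp follows the usual pattern: pick a kernel from CPU feature flags, then fall back toward the scalar baseline. The sketch below is a minimal version of that pattern and rests on several assumptions:

- The `CpuFeatures` struct and the placeholder kernels are hypothetical, not the library's real names or flags.
- The 16-dimension cutoff is taken from a later commit message in this PR.
- The SVE2, then SVE, then NEON order mirrors the order the unit tests check.

```cpp
#include <cassert>
#include <cstddef>

// Common signature for distance kernels in this sketch.
using dist_func_t = float (*)(const void *, const void *, std::size_t);

// Hypothetical stand-in for the library's CPU-feature struct.
struct CpuFeatures {
    bool avx512_bw_vl_vnni = false;
    bool sve2 = false;
    bool sve = false;
    bool neon = false;
};

// Placeholder kernels: they do not compute a distance, they only return a
// distinct tag so the chosen path is observable.
float ip_sq8_sq8_baseline(const void *, const void *, std::size_t) { return 0.0f; }
float ip_sq8_sq8_avx512(const void *, const void *, std::size_t) { return 1.0f; }
float ip_sq8_sq8_sve2(const void *, const void *, std::size_t) { return 2.0f; }
float ip_sq8_sq8_sve(const void *, const void *, std::size_t) { return 3.0f; }
float ip_sq8_sq8_neon(const void *, const void *, std::size_t) { return 4.0f; }

// Below this dimension the scalar baseline is used (threshold assumed from the PR's commits).
constexpr std::size_t kMinOptimizedDim = 16;

dist_func_t choose_sq8_sq8_ip(std::size_t dim, const CpuFeatures &cpu) {
    if (dim < kMinOptimizedDim) return ip_sq8_sq8_baseline;
    if (cpu.avx512_bw_vl_vnni) return ip_sq8_sq8_avx512;
    if (cpu.sve2) return ip_sq8_sq8_sve2;
    if (cpu.sve) return ip_sq8_sq8_sve;
    if (cpu.neon) return ip_sq8_sq8_neon;
    return ip_sq8_sq8_baseline;
}
```

A test can then compare the returned pointer with the expected per-architecture function. Clearing one flag at a time exercises each fallback in turn.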

Copilot

Pull request overview

This PR adds support for SQ8-to-SQ8 distance functions for Inner Product and Cosine similarity metrics, where both vectors are scalar-quantized 8-bit representations. This extends the existing SQ8 functionality, which only handled float-to-SQ8 comparisons.
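Comparing two SQ8 vectors does not require dequantizing element by element. The sketch assumes the common SQ8 scheme, where each code `q` dequantizes to `min + delta * q`. The PR's exact metadata layout is not shown here, so `SQ8Vector` is hypothetical. Under that scheme, the dequantized inner product expands into a pure uint8 dot product plus per-vector terms. An integer kernel such as `UINT8_InnerProductImp` can then compute the expensive part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SQ8 representation: code q dequantizes to min + delta * q.
struct SQ8Vector {
    std::vector<uint8_t> codes;
    float min;
    float delta;
};

// Reference: dequantize both vectors, then take the float dot product.
float ip_dequantize_naive(const SQ8Vector &x, const SQ8Vector &y) {
    assert(x.codes.size() == y.codes.size());
    float dot = 0.0f;
    for (std::size_t i = 0; i < x.codes.size(); ++i) {
        const float xi = x.min + x.delta * x.codes[i];
        const float yi = y.min + y.delta * y.codes[i];
        dot += xi * yi;
    }
    return dot;
}

// Same value via the expansion
//   sum (mx + dx*a)(my + dy*b) = n*mx*my + mx*dy*sum(b) + my*dx*sum(a) + dx*dy*sum(a*b),
// so the hot loop only touches uint8 codes with exact integer accumulators.
float ip_integer_decomposed(const SQ8Vector &x, const SQ8Vector &y) {
    assert(x.codes.size() == y.codes.size());
    const std::size_t n = x.codes.size();
    uint64_t sum_a = 0, sum_b = 0, sum_ab = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum_a += x.codes[i];
        sum_b += y.codes[i];
        sum_ab += static_cast<uint32_t>(x.codes[i]) * y.codes[i];
    }
    return static_cast<float>(n) * x.min * y.min +
           x.min * y.delta * static_cast<float>(sum_b) +
           y.min * x.delta * static_cast<float>(sum_a) +
           x.delta * y.delta * static_cast<float>(sum_ab);
}
```

The code sums `sum_a` and `sum_b` do not depend on the other vector. They can therefore be computed once at quantization time and stored with each vector, which leaves only `sum_ab` in the per-pair loop.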

Key changes:

- Implemented optimized SIMD versions for Intel (AVX512) and ARM (SVE, SVE2, NEON) architectures
- Added comprehensive test coverage for all optimization variants
- Integrated new benchmark suite for performance testing
- Refactored existing helper function names to avoid naming conflicts

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Summary per file:

File	Description
tests/unit/test_spaces.cpp	Added unit tests for SQ8_SQ8 IP and Cosine functions across all architectures; improved test suite naming consistency
tests/benchmark/spaces_benchmarks/bm_spaces_sq8_sq8.cpp	New benchmark file for SQ8_SQ8 distance functions
tests/benchmark/spaces_benchmarks/bm_spaces_sq8.cpp	Removed empty comment lines (cleanup)
tests/benchmark/benchmarks.sh	Added spaces_sq8_sq8 to benchmark configurations
tests/benchmark/CMakeLists.txt	Added sq8_sq8 data type to build system
src/VecSim/spaces/functions/SVE2.h	Added function declarations for SQ8_SQ8 IP and Cosine implementations
src/VecSim/spaces/functions/SVE2.cpp	Implemented SVE2 function choosers for SQ8_SQ8 distance functions
src/VecSim/spaces/functions/SVE.h	Added function declarations for SQ8_SQ8 IP and Cosine implementations
src/VecSim/spaces/functions/SVE.cpp	Implemented SVE function choosers for SQ8_SQ8 distance functions
src/VecSim/spaces/functions/NEON.h	Added function declarations for SQ8_SQ8 IP and Cosine implementations
src/VecSim/spaces/functions/NEON.cpp	Implemented NEON function choosers for SQ8_SQ8 distance functions
src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.h	Added function declarations for SQ8_SQ8 IP and Cosine implementations
src/VecSim/spaces/functions/AVX512F_BW_VL_VNNI.cpp	Implemented AVX512 function choosers for SQ8_SQ8 distance functions
src/VecSim/spaces/IP_space.h	Added public API declarations for SQ8_SQ8 distance functions
src/VecSim/spaces/IP_space.cpp	Implemented dispatcher functions that select appropriate SIMD implementation
src/VecSim/spaces/IP/IP_SVE_SQ8_SQ8.h	New file with SVE-optimized SQ8_SQ8 IP and Cosine implementations
src/VecSim/spaces/IP/IP_NEON_SQ8_SQ8.h	New file with NEON-optimized SQ8_SQ8 IP and Cosine implementations
src/VecSim/spaces/IP/IP_AVX512F_SQ8_SQ8_BW_VL_VNNI.h	New file with AVX512-optimized SQ8_SQ8 IP and Cosine implementations
src/VecSim/spaces/IP/IP_AVX512F_SQ8_BW_VL_VNNI.h	Renamed helper function to avoid naming conflicts with new SQ8_SQ8 implementations
src/VecSim/spaces/IP/IP_AVX2_SQ8.h	Renamed helper function to avoid naming conflicts and fixed cosine normalization
src/VecSim/spaces/IP/IP.h	Added function declarations for SQ8_SQ8 base implementations
src/VecSim/spaces/IP/IP.cpp	Implemented baseline SQ8_SQ8 InnerProduct and Cosine functions

src/VecSim/spaces/IP/IP_NEON_SQ8_SQ8.h

src/VecSim/spaces/IP/IP_AVX512F_SQ8_SQ8_BW_VL_VNNI.h

codecov · 2025-12-28T09:59:29Z

Codecov Report

❌ Patch coverage is 97.10145% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.02%. Comparing base (cd722bf) to head (2e57cf2).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/VecSim/spaces/IP_space.cpp	87.50%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #873      +/-   ##
==========================================
- Coverage   97.03%   97.02%   -0.01%     
==========================================
  Files         126      127       +1     
  Lines        7455     7506      +51     
==========================================
+ Hits         7234     7283      +49     
- Misses        221      223       +2

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 12 comments.

src/VecSim/spaces/functions/NEON_DOTPROD.cpp

src/VecSim/spaces/IP_space.cpp

tests/unit/test_spaces.cpp

src/VecSim/spaces/IP_space.cpp

src/VecSim/spaces/IP/IP_NEON_DOTPROD_SQ8_SQ8.h

src/VecSim/spaces/functions/NEON.h

tests/unit/test_spaces.cpp

src/VecSim/spaces/IP_space.cpp

src/VecSim/spaces/IP/IP_NEON_DOTPROD_SQ8_SQ8.h

meiravgri

Please revert the changes in files that are not related to this PR (all changes to IP_*_SQ8.h files).
Consider comparing the optimized IP_*UINT8.h implementations with the new SQ8_SQ8 functions to make sure we get the best optimization.

meiravgri · 2026-01-04T06:16:08Z

tests/utils/tests_utils.h

+    const auto *pVect1 = static_cast<const uint8_t *>(pVect1v);
+    const auto *pVect2 = static_cast<const uint8_t *>(pVect2v);
+
+    // Extract metadata from the end of vectors (likely already prefetched)


What does "(likely already prefetched)" mean?
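The excerpt reads per-vector metadata stored after the quantized codes. That amounts to a couple of loads at a fixed offset. The minimal sketch below assumes a hypothetical `[dim uint8 codes][float min][float delta]` layout. The PR's actual layout may hold other fields, such as a precomputed sum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical metadata stored right after the dim uint8 codes.
struct SQ8Meta {
    float min;
    float delta;
};

// Build a blob with the assumed layout: [codes][min][delta].
std::vector<uint8_t> make_sq8_blob(const std::vector<uint8_t> &codes, float min, float delta) {
    std::vector<uint8_t> blob(codes.size() + 2 * sizeof(float));
    if (!codes.empty()) std::memcpy(blob.data(), codes.data(), codes.size());
    std::memcpy(blob.data() + codes.size(), &min, sizeof(float));
    std::memcpy(blob.data() + codes.size() + sizeof(float), &delta, sizeof(float));
    return blob;
}

// Read the trailing metadata of a blob with dim codes.
SQ8Meta read_sq8_meta(const void *blob, std::size_t dim) {
    assert(blob != nullptr);
    const auto *bytes = static_cast<const uint8_t *>(blob);
    SQ8Meta meta;
    // memcpy instead of a float* cast: offset dim is generally not 4-byte aligned.
    std::memcpy(&meta.min, bytes + dim, sizeof(float));
    std::memcpy(&meta.delta, bytes + dim + sizeof(float), sizeof(float));
    return meta;
}
```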

tests/utils/tests_utils.h

meiravgri · 2026-01-04T06:22:09Z

tests/unit/test_spaces.cpp

+    float baseline = SQ8_Cosine(v1_orig.data(), v2_quantized.data(), dim);
+
+#ifdef OPT_SVE2
+    if (optimization.sve2) {
+        unsigned char alignment = 0;
+        arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization);
+        ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_SVE2(dim))
+            << "Unexpected distance function chosen for dim " << dim;
+        ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01)
+            << "SVE2 with dim " << dim;
+        optimization.sve2 = 0;
+    }
+#endif
+#ifdef OPT_SVE
+    if (optimization.sve) {
+        unsigned char alignment = 0;
+        arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization);
+        ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_SVE(dim))
+            << "Unexpected distance function chosen for dim " << dim;
+        ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01)
+            << "SVE with dim " << dim;
+        optimization.sve = 0;
+    }
+#endif
+#ifdef OPT_NEON
+    if (optimization.asimd) {
+        unsigned char alignment = 0;
+        arch_opt_func = Cosine_SQ8_GetDistFunc(dim, &alignment, &optimization);
+        ASSERT_EQ(arch_opt_func, Choose_SQ8_Cosine_implementation_NEON(dim))
+            << "Unexpected distance function chosen for dim " << dim;
+        ASSERT_NEAR(baseline, arch_opt_func(v1_orig.data(), v2_quantized.data(), dim), 0.01)
+            << "NEON with dim " << dim;
+        optimization.asimd = 0;
+    }
+#endif



Can we avoid this change (and all the following ones) that is not related to sq8_sq8?

I think the PR should also handle the fixes needed for regular SQ8, since it was also adapted for use by the disk implementation.

We need to implement them for sure, but in a different PR.

tests/utils/tests_utils.h

tests/unit/test_spaces.cpp

src/VecSim/spaces/IP/IP_AVX512F_BW_VL_VNNI_SQ8_SQ8.h

src/VecSim/spaces/IP/IP_NEON_DOTPROD_SQ8_SQ8.h

…Calculations
- Refactored inner product calculations for SQ8 vectors using NEON and SVE optimizations.
- Integrated UINT8_InnerProductImp for efficient dot product computation in the NEON and SVE implementations.
- Updated inner product functions to handle 64-element chunks for improved performance.
- Adjusted distance function selection logic to ensure optimizations are applied only for dimensions >= 16.
- Added tests for zero vectors and constant vectors to validate optimized implementations against baseline results.
- Ensured consistency in assertions for symmetry tests across various optimization flags.
- Improved code readability and maintainability by removing redundant code and comments.
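The 64-element chunking can be illustrated with a scalar stand-in. In this sketch, the name `u8_dot_chunked` is hypothetical and is not the PR's kernel. The remainder `dim % 64` is a template parameter, so a chooser can instantiate one specialization per remainder. Leftover elements are handled before the full chunks, roughly where a SIMD kernel would use a masked or partial load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar stand-in for a chunked SIMD uint8 dot product; residual == dim % 64.
// Each product is at most 255 * 255, so a uint32_t accumulator is safe for
// dimensions up to about 66,000.
template <unsigned residual>
uint32_t u8_dot_chunked(const uint8_t *a, const uint8_t *b, std::size_t dim) {
    assert(dim % 64 == residual);
    uint32_t sum = 0;
    std::size_t i = 0;
    // Leftover elements first (a SIMD kernel would use a masked load here).
    for (; i < residual; ++i) sum += static_cast<uint32_t>(a[i]) * b[i];
    // Then whole 64-element chunks.
    for (; i < dim; i += 64)
        for (std::size_t j = 0; j < 64; ++j)
            sum += static_cast<uint32_t>(a[i + j]) * b[i + j];
    return sum;
}
```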

- Updated inner product functions for NEON, SSE4, and SVE to streamline dequantization and reduce unnecessary calculations.
- Consolidated common logic for inner product and cosine calculations across different SIMD implementations.
- Enhanced the handling of vector normalization and quantization in unit tests, ensuring consistency in compressed vector sizes.
- Adjusted benchmark tests to reflect changes in vector compression and distance function calls.
- Corrected include paths for AVX512 implementations to maintain consistency across the codebase.

meiravgri · 2026-01-04T13:50:22Z

src/VecSim/spaces/IP/IP_AVX512F_BW_VL_VNNI_SQ8_SQ8.h

+float SQ8_SQ8_InnerProductImp(const void *pVec1v, const void *pVec2v, size_t dimension) {
+    // Compute raw dot product using efficient UINT8 AVX512 VNNI implementation
+    // UINT8_InnerProductImp uses _mm512_dpwssd_epi32 for native integer dot product
+    int dot_product = UINT8_InnerProductImp<residual>(pVec1v, pVec2v, dimension);
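The comment in this excerpt refers to `_mm512_dpwssd_epi32` (AVX512-VNNI). That instruction takes signed 16-bit elements, so uint8 codes would first need zero-extending to 16 bits. This is assumed here, since the excerpt does not show the widening step. The scalar model below sketches the per-lane semantics: each of the 16 int32 lanes accumulates the sum of two adjacent 16-bit products. The names are hypothetical, and the model is an illustration, not a replacement for the intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Words = std::array<int16_t, 32>;   // models a 512-bit register as 32 x int16
using Dwords = std::array<int32_t, 16>;  // models a 512-bit register as 16 x int32

// Scalar model of _mm512_dpwssd_epi32(src, a, b): lane j becomes
// src[j] + a[2j]*b[2j] + a[2j+1]*b[2j+1], with products taken in 32 bits.
// The hardware wraps on overflow; this model assumes inputs that do not overflow.
Dwords dpwssd_model(Dwords src, const Words &a, const Words &b) {
    for (std::size_t j = 0; j < 16; ++j) {
        const int32_t lo = int32_t(a[2 * j]) * int32_t(b[2 * j]);
        const int32_t hi = int32_t(a[2 * j + 1]) * int32_t(b[2 * j + 1]);
        src[j] += lo + hi;
    }
    return src;
}

// uint8 dot product on top of the model: zero-extend 32 codes at a time to
// int16 (the role _mm512_cvtepu8_epi16 would play), accumulate, reduce lanes.
int64_t u8_dot_via_dpwssd(const uint8_t *x, const uint8_t *y, std::size_t dim) {
    assert(dim % 32 == 0);  // this sketch handles whole 32-element blocks only
    Dwords acc{};
    for (std::size_t i = 0; i < dim; i += 32) {
        Words a{}, b{};
        for (std::size_t k = 0; k < 32; ++k) {
            a[k] = static_cast<int16_t>(x[i + k]);
            b[k] = static_cast<int16_t>(y[i + k]);
        }
        acc = dpwssd_model(acc, a, b);
    }
    int64_t total = 0;
    for (int32_t lane : acc) total += lane;
    return total;
}
```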


meiravgri · 2026-01-04T14:07:40Z

tests/benchmark/spaces_benchmarks/bm_spaces_sq8_sq8.cpp

+// NEON SQ8-to-SQ8 functions
+#ifdef OPT_NEON
+bool neon_supported = opt.asimd;
+INITIALIZE_BENCHMARKS_SET_IP(BM_VecSimSpaces_SQ8_SQ8, SQ8_SQ8, NEON, 16, neon_supported);


Is there a reason that the dim_opt is not aligned with bm_spaces_uint.cpp values?

No, just forgot to change it.

dor-forer added 4 commits December 28, 2025 09:37

Add SQ8-to-SQ8 benchmark tests and update related scripts

8697a3e

Format

e0ce268

Orgnizing

ab6b077

dor-forer requested a review from Copilot December 28, 2025 09:39

dor-forer changed the title from "Sq8 to Sq8 dist functions - ip and cosine" to "Sq8 to Sq8 dist functions - ip and cosine [MOD-13170]" on Dec 28, 2025

Copilot started reviewing on behalf of dor-forer December 28, 2025 09:39

Copilot AI reviewed Dec 28, 2025

src/VecSim/spaces/IP/IP_NEON_SQ8_SQ8.h (outdated, resolved)

src/VecSim/spaces/IP/IP_AVX512F_SQ8_SQ8_BW_VL_VNNI.h (outdated, resolved)

Add full sq8 bencharks

931e339

dor-forer added 4 commits December 28, 2025 12:05

Optimize the sq8 sq8

a56474d

Optimize SQ8 distance functions for NEON by reducing operations and improving performance

a25f45c

format

0ad941e

Add NEON DOTPROD-optimized distance functions for SQ8-to-SQ8 calculations

68cd068

dor-forer requested a review from Copilot December 28, 2025 13:49

Copilot started reviewing on behalf of dor-forer December 28, 2025 13:50

Copilot AI reviewed Dec 28, 2025

PR

0b4b568

dor-forer requested a review from meiravgri December 28, 2025 14:28

dor-forer added 11 commits December 28, 2025 16:30

Remove NEON DOTPROD-optimized distance functions for INT8, UINT8, and SQ8-to-SQ8 calculations

d0fd2e4

Fix vector layout documentation by removing inv_norm from comments in NEON and AVX512 headers

9de6163

Remove 'constexpr' from ones vector declaration in NEON inner product function

63a46a1

Refactor distance functions to remove inv_norm parameter and update documentation accordingly

525f8da

Update SQ8 Cosine test to normalize both input vectors and adjust distance assertion tolerance

13a477b

Rename 'compressed' to 'quantized' in SQ8 functions for clarity and consistency

c18000e

Implement SQ8-to-SQ8 distance functions with precomputed sum and norm using AVX512 VNNI; add benchmarks and tests for new functionality

bbf810e

Add edge case tests for SQ8-to-SQ8 precomputed cosine distance functions

dbbb7d9

Refactor SQ8 test cases to use CreateSQ8QuantizedVector for vector population

36ab068

Implement SQ8-to-SQ8 precomputed distance functions using ARM NEON, SVE, and AVX512; add corresponding selection functions and update tests for consistency.

00617d7

Implement SQ8-to-SQ8 precomputed inner product and cosine functions; update benchmarks and tests for new functionality

4331d91

dor-forer added 2 commits January 4, 2026 09:10

Add CPU feature checks to disable optimizations for AArch64 in SQ8 distance function

f28f4e7

Add CPU feature checks to disable optimizations for AArch64 in SQ8 distance function tests

e50dc45

dor-forer requested a review from meiravgri January 4, 2026 07:14

Fix formatting issues in SQ8 inner product function and clean up conditional compilation in tests

6bbbc38

meiravgri reviewed Jan 4, 2026

dor-forer added 3 commits January 4, 2026 13:54

Fix header guard duplication and update test assertion for floating-point comparison

d7972e9

Add missing pragma once directive in NEON header files

a8075bf

dor-forer requested a review from meiravgri January 4, 2026 12:09

dor-forer added 5 commits January 4, 2026 14:54

Update SQ8 vector population functions to include metadata and adjust compressed size calculations

4f0fec7

Refactor SQ8 inner product functions for improved clarity and performance

8ab4192

Rename inner product implementation functions for AVX2 and AVX512 for clarity

8c59cb2

Refactor SQ8 cosine function to utilize inner product function for improved clarity

a4ff5d0

meiravgri reviewed Jan 4, 2026

dor-forer added 2 commits January 4, 2026 15:56

Remove redundant inner product edge case tests for SQ8 distance functions

c22158f

Add SVE2 support to SQ8-to-SQ8 Inner Product distance function

4c19d9e

meiravgri reviewed Jan 4, 2026

dor-forer added 2 commits January 4, 2026 16:15

Remove SVE2 and other optimizations from SQ8 cosine function test for ARM architecture

5c22af8

Update NEON benchmarks to use a vector size of 64 for SQ8-to-SQ8 functions

9e50d7c

dor-forer requested a review from meiravgri January 4, 2026 15:13

meiravgri previously approved these changes Jan 4, 2026

meiravgri added the bm-spaces-sq8-full label Jan 4, 2026

Increase allocated space for cosine calculations in SQ8 benchmark setup

2e57cf2

dor-forer dismissed meiravgri’s stale review via 2e57cf2 January 4, 2026 17:09

meiravgri approved these changes Jan 4, 2026

dor-forer added this pull request to the merge queue Jan 4, 2026

Merged via the queue into main with commit b0fd737 Jan 4, 2026
24 checks passed

dor-forer deleted the dorer-sq8-dist-functions-ip-cosine branch January 4, 2026 18:28
